13 research outputs found

    Misclassification analysis for the class imbalance problem

    In classification, the class imbalance problem typically causes the learning algorithm to be dominated by the majority classes, so that features of the minority classes are sometimes ignored; this in turn distorts how humans visualise the data. Special care is therefore needed in the learning algorithm to improve accuracy for the minority classes. In this study, the use of misclassification analysis is investigated for data re-distribution, and several under-sampling and hybrid techniques based on misclassification analysis are proposed. Benchmark data sets from the University of California Irvine (UCI) machine learning repository are used to evaluate the proposed techniques. The results show that the proposed hybrid technique gives the best performance in the experiments.
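
    As a rough illustration, a minimal sketch of one plausible misclassification-analysis under-sampling scheme follows; the classifier choice, function name and removal rule are illustrative assumptions, not the paper's exact procedure.

    # Illustrative sketch (not the paper's exact procedure): under-sample by
    # dropping majority-class instances that a preliminary model misclassifies.
    from sklearn.model_selection import cross_val_predict
    from sklearn.neural_network import MLPClassifier

    def misclassification_undersample(X, y, majority_label):
        """X, y: NumPy arrays. Returns a reduced training set."""
        # Out-of-fold predictions avoid judging an instance with a model
        # that was trained on that same instance.
        preds = cross_val_predict(MLPClassifier(max_iter=500), X, y, cv=5)
        drop = (preds != y) & (y == majority_label)  # misclassified majority only
        return X[~drop], y[~drop]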

    Deep Over-sampling Framework for Classifying Imbalanced Data

    Class imbalance is a challenging issue in practical classification problems for deep learning models as well as traditional models. Traditionally successful countermeasures such as synthetic over-sampling have had limited success with the complex, structured data handled by deep learning models. In this paper, we propose Deep Over-sampling (DOS), a framework that extends synthetic over-sampling to exploit the deep feature space acquired by a convolutional neural network (CNN). Its key feature is explicit, supervised representation learning, in which the training data pairs each raw input sample with a synthetic embedding target in the deep feature space, sampled from the linear subspace of its in-class neighbors. We implement an iterative process of training the CNN and updating the targets, which induces smaller in-class variance among the embeddings and increases the discriminative power of the deep representation. We present an empirical study using public benchmarks, which shows that the DOS framework not only counteracts class imbalance better than existing methods, but also improves the performance of the CNN in standard, balanced settings.
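
    The core sampling step might look like the sketch below: each sample's embedding target is drawn from the linear subspace of its in-class neighbors in the deep feature space (here a random convex combination, which keeps the target in-class). The CNN, loss wiring and iterative target updates are omitted, and all names are assumptions.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def synthetic_embedding_targets(Z, k=5, seed=0):
        """Z: (n, d) deep features of one class; returns one synthetic
        embedding target per sample from its k in-class neighbors."""
        rng = np.random.default_rng(seed)
        nn = NearestNeighbors(n_neighbors=k + 1).fit(Z)
        _, idx = nn.kneighbors(Z)            # idx[:, 0] is the sample itself
        targets = np.empty_like(Z)
        for i, neigh in enumerate(idx[:, 1:]):
            w = rng.dirichlet(np.ones(k))    # random convex weights
            targets[i] = w @ Z[neigh]
        return targets

    In the full framework, one would alternate between extracting features with the CNN, recomputing targets this way, and training the CNN to regress each embedding onto its target alongside the classification loss.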

    A novel feature selection-based sequential ensemble learning method for class noise detection in high-dimensional data

    Most of the irrelevant or noisy features in high-dimensional data pose significant challenges to feature-selection-based methods for detecting mislabeled instances. Traditional methods perform two dependent steps: first, searching for a relevant feature subspace, and second, training a model on the subspace obtained in the first step. However, a feature subspace selected this way is not related to the noise scores, which limits detection performance. In this paper, we propose SENF, a novel sequential ensemble method that couples these two phases: it learns sequential ensembles that refine the feature subspace and improve detection accuracy by iterative sparse modeling with the noise scores as the regression target. Through extensive experiments on 8 real-world high-dimensional datasets from the UCI machine learning repository [3], we show that SENF performs significantly better than, or at least comparably to, the individual baselines as well as existing state-of-the-art label noise detection methods.
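
    A hedged sketch of the sequential-ensemble loop: sparse regression with the current noise scores as the target selects the refined subspace, which is then used to re-score. The abstract does not specify SENF's scoring function, so the k-NN label-disagreement score below is an illustrative stand-in, as are all names.

    import numpy as np
    from sklearn.linear_model import Lasso
    from sklearn.neighbors import KNeighborsClassifier

    def noise_scores(X, y, k=10):
        # Stand-in score: estimated probability that an instance's label
        # disagrees with its neighborhood (assumes integer labels 0..C-1).
        knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
        proba = knn.predict_proba(X)
        return 1.0 - proba[np.arange(len(y)), y]

    def senf_like(X, y, iters=3, alpha=0.01):
        cols = np.arange(X.shape[1])
        for _ in range(iters):
            s = noise_scores(X[:, cols], y)
            # Sparse regression toward the noise scores keeps only the
            # features that are actually related to them.
            lasso = Lasso(alpha=alpha).fit(X[:, cols], s)
            keep = np.flatnonzero(lasso.coef_ != 0)
            if keep.size == 0:
                break
            cols = cols[keep]
        return cols, noise_scores(X[:, cols], y)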

    Learning From Multiple Experts: Self-paced Knowledge Distillation for Long-Tailed Classification

    In real-world scenarios, data tends to exhibit a long-tailed distribution, which increases the difficulty of training deep networks. In this paper, we propose a novel self-paced knowledge distillation framework, termed Learning From Multiple Experts (LFME). Our method is inspired by the observation that networks trained on less imbalanced subsets of the distribution often yield better performance than their jointly-trained counterparts. We refer to these models as 'Experts', and the proposed LFME framework aggregates the knowledge from multiple 'Experts' to learn a unified student model. Specifically, the framework involves two levels of adaptive learning schedules, Self-paced Expert Selection and Curriculum Instance Selection, so that knowledge is adaptively transferred to the 'Student'. We conduct extensive experiments and demonstrate that our method achieves superior performance compared to state-of-the-art methods. We also show that our method can be easily plugged into state-of-the-art long-tailed classification algorithms for further improvements. Comment: ECCV 2020 Spotlight.
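
    The distillation step alone might look like the sketch below: the student matches a weighted mixture of expert outputs via KL divergence, with the self-paced expert/instance schedules reduced to fixed weights. Function and parameter names are assumptions, not the authors' code.

    import torch.nn.functional as F

    def multi_expert_distillation_loss(student_logits, expert_logits_list,
                                       labels, expert_weights, T=2.0):
        """Cross-entropy on the true labels plus KL distillation toward a
        weighted mixture of expert soft targets at temperature T."""
        ce = F.cross_entropy(student_logits, labels)
        # Weighted average of the experts' softened output distributions.
        soft_targets = sum(w * F.softmax(l / T, dim=1)
                           for w, l in zip(expert_weights, expert_logits_list))
        kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                      soft_targets, reduction='batchmean') * (T * T)
        return ce + kd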

    Data cleaning for classification using misclassification analysis

    In many classification problems, data cleaning is used as a preprocessing step to achieve better results. The purpose of data cleaning is to remove noise, inconsistent data and errors from the training data, enabling a better and more representative data set for developing a reliable classification model, since unclean data can degrade a model's classification accuracy. In this paper, we investigate the use of misclassification analysis for data cleaning. To demonstrate the concept, we use an Artificial Neural Network (ANN) as the core computational intelligence technique and four benchmark data sets from the University of California Irvine (UCI) machine learning repository, all binary classification problems: the German credit data, BUPA liver disorders, Johns Hopkins Ionosphere and Pima Indians Diabetes. The results show that the proposed cleaning technique can be a good alternative for providing some confidence when constructing a classification model.
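
    A minimal sketch of this kind of cleaning, assuming out-of-fold misclassification as the noise flag and using a scikit-learn bundled dataset as a stand-in for the UCI benchmarks:

    from sklearn.datasets import load_breast_cancer   # stand-in benchmark
    from sklearn.model_selection import cross_val_predict, cross_val_score
    from sklearn.neural_network import MLPClassifier

    X, y = load_breast_cancer(return_X_y=True)
    ann = MLPClassifier(max_iter=1000, random_state=0)

    # Flag instances the ANN misclassifies under out-of-fold prediction,
    # treating them as likely noise or inconsistencies, and drop them.
    suspect = cross_val_predict(ann, X, y, cv=10) != y
    X_clean, y_clean = X[~suspect], y[~suspect]

    print("before cleaning:", cross_val_score(ann, X, y, cv=10).mean())
    print("after cleaning :", cross_val_score(ann, X_clean, y_clean, cv=10).mean())

    Note that the after-cleaning score is optimistic, since the hardest instances were removed from the evaluation set too; a fair comparison would clean only the training folds.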

    Classification of imbalanced data by combining the complementary neural network and SMOTE algorithm

    In classification, when the distribution of training data among classes is uneven, the learning algorithm is generally dominated by the features of the majority classes, and the features of the minority classes are often not fully recognized. In this paper, a method is proposed to enhance classification accuracy for the minority classes. The proposed method combines the Synthetic Minority Over-sampling Technique (SMOTE) and the Complementary Neural Network (CMTNN) to handle the problem of classifying imbalanced data. To demonstrate that the proposed technique can assist classification of imbalanced data, several classification algorithms are used: Artificial Neural Network (ANN), k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM). Benchmark data sets with various ratios between the minority and majority classes are obtained from the University of California Irvine (UCI) machine learning repository. The results show that the proposed combined techniques improve performance on the class imbalance problem.
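
    One plausible reading of the combination is sketched below, with SMOTE (from imbalanced-learn) on the over-sampling side and the complementary pair approximated by two MLPs, one trained on the targets and one on their complements; instances on which the pair disagrees are treated as misclassification patterns. Binary 0/1 labels and all names are assumptions, not the paper's exact method.

    from imblearn.over_sampling import SMOTE
    from sklearn.neural_network import MLPClassifier

    def cmtnn_smote_resample(X, y, majority_label):
        # Over-sampling side: SMOTE synthesizes minority instances.
        X_os, y_os = SMOTE(random_state=0).fit_resample(X, y)
        # Complementary pair: truth net on y, falsity net on 1 - y
        # (assumes binary 0/1 integer labels).
        truth = MLPClassifier(max_iter=500, random_state=0).fit(X_os, y_os)
        falsity = MLPClassifier(max_iter=500, random_state=1).fit(X_os, 1 - y_os)
        t = truth.predict(X_os)
        f = 1 - falsity.predict(X_os)
        # Under-sampling side: drop majority instances on which the two
        # networks disagree, treating them as unreliable patterns.
        drop = (t != f) & (y_os == majority_label)
        return X_os[~drop], y_os[~drop]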

    Comparing the performance of different neural networks for binary classification problems

    Classification is a decision-making task that many researchers have been working on, and a number of techniques have been proposed to perform it. The neural network is one of the artificial intelligence techniques with many successful applications to this problem. This paper presents a comparison of neural network techniques for binary classification problems. Five different types of neural networks are compared: Back Propagation Neural Network (BPNN), Radial Basis Function Neural Network (RBFNN), General Regression Neural Network (GRNN), Probabilistic Neural Network (PNN), and Complementary Neural Network (CMTNN). The comparison is based on three benchmark data sets from the UCI machine learning repository. The results show that CMTNN typically provides better classification results than the other techniques applied to the binary classification problems.
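
    A small harness in the spirit of this comparison is sketched below; scikit-learn has no drop-in implementations of most of the five network types, so an MLP stands in for BPNN and an RBF-kernel SVM serves as a loose RBF-network analogue, with a bundled dataset replacing the UCI benchmarks.

    from sklearn.datasets import load_breast_cancer     # stand-in benchmark
    from sklearn.model_selection import cross_val_score
    from sklearn.neural_network import MLPClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = load_breast_cancer(return_X_y=True)
    candidates = {
        "BPNN (MLP stand-in)": MLPClassifier(max_iter=1000, random_state=0),
        "RBF-network analogue (RBF-SVM)": SVC(kernel="rbf"),
    }
    for name, model in candidates.items():
        score = cross_val_score(make_pipeline(StandardScaler(), model),
                                X, y, cv=10).mean()
        print(f"{name}: {score:.3f}")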

    Multiclass Imbalanced Classification Using Fuzzy C-Mean and SMOTE with Fuzzy Support Vector Machine

    A hybrid sampling technique is proposed that combines Fuzzy C-Means clustering and the Synthetic Minority Oversampling Technique (FCMSMT) to tackle the imbalanced multiclass classification problem. The mean number of instances across classes is used as the target number of instances for both undersampling and oversampling. Using the mean as the fixed number of required instances for each class prevents within-class data from being eliminated erroneously during undersampling, which can decrease both within-class and between-class errors and thus increase classification performance. The study was conducted on eight benchmark datasets from the KEEL and UCI repositories, and the results were compared across three major classifiers using G-mean and AUC measurements. The results reveal that the proposed technique can handle most of the multiclass imbalanced datasets used in the experiments for all classifiers while retaining the integrity of the original data.
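
    The resampling rule as the abstract describes it can be sketched as follows, with every class brought toward the mean class size: small classes are over-sampled with SMOTE and large ones under-sampled cluster-wise. KMeans stands in for fuzzy C-means here, the per-cluster quota logic is simplified, and each class is assumed to have more than SMOTE's default of six neighbors.

    import numpy as np
    from imblearn.over_sampling import SMOTE
    from sklearn.cluster import KMeans

    def fcmsmt_like_resample(X, y, n_clusters=3, seed=0):
        rng = np.random.default_rng(seed)
        classes, counts = np.unique(y, return_counts=True)
        mean_n = int(counts.mean())          # common target size per class
        Xp, yp = [], []
        for c, n in zip(classes, counts):
            Xc = X[y == c]
            if n > mean_n:
                # Under-sample cluster-wise so within-class modes survive
                # (KMeans as a stand-in for fuzzy C-means).
                labels = KMeans(n_clusters=n_clusters, n_init=10,
                                random_state=seed).fit_predict(Xc)
                quota = mean_n // n_clusters
                keep = np.concatenate([
                    rng.choice(np.flatnonzero(labels == k),
                               size=min(quota, int((labels == k).sum())),
                               replace=False)
                    for k in range(n_clusters)])
                Xc = Xc[keep]
            Xp.append(Xc)
            yp.append(np.full(len(Xc), c))
        X2, y2 = np.vstack(Xp), np.concatenate(yp)
        # Over-sample every below-mean class up to the mean with SMOTE.
        target = {c: max(mean_n, int((y2 == c).sum())) for c in classes}
        return SMOTE(sampling_strategy=target,
                     random_state=seed).fit_resample(X2, y2)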